Process and arrangement for encoding video pictures

ABSTRACT

Today's video codecs require an intelligent choice among many coding options. This choice can be made efficiently using Lagrangian coder control. However, Lagrangian coder control only provides results for a particular Lagrange parameter, which corresponds to some unknown transmission rate. Rate control algorithms, on the other hand, provide coding results at a given bit-rate but without the optimization performance of Lagrangian coder control. The combination of rate control and Lagrangian optimization for hybrid video coding is investigated. A new approach is suggested to incorporate these two known methods into the video coder control using macroblock mode decision and quantizer adaptation. The rate-distortion performance of the proposed approach is validated and analyzed via experimental results. It is shown that, for most bit-rates, the combined rate control and Lagrangian optimization, producing a constant number of bits per picture, achieves rate-distortion performance similar to that of the constant-slope case using only Lagrangian optimization.

This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2004/009391, filed Aug. 23, 2004, which was published in accordance with PCT Article 21(2) on Mar. 10, 2005 in English and which claims the benefit of European patent application No. 03090282.9, filed Sep. 3, 2003.

BACKGROUND OF THE INVENTION

(a) Field of the Invention

The invention relates to a process and an arrangement for encoding video pictures. In particular, it concerns the operational control of the macroblock layer, and it is usable for encoding video sequences.

(b) Description of Related Art

The specifications of most block-based hybrid video coding standards, including MPEG-2 Visual [1], H.263 [2], MPEG-4 Visual [3] and H.264/AVC [4], provide only the bit-stream syntax and the decoding process in order to allow interoperability. References listed as numbers in square brackets are identified at the end of the detailed description under the heading "References". The encoding process is left out of scope to permit flexible implementations. However, the operational control of the source encoder is a key problem in video compression. For the encoding of a video source, a variety of coding parameters such as quantization parameters, macroblock and block modes, motion vectors, and quantized transform coefficients have to be determined. On the one hand, the chosen values determine the rate-distortion efficiency of the produced bit-stream for a given decoder. On the other hand, these parameters also determine the required transmission rate and decoding delay.

BRIEF SUMMARY OF THE INVENTION

In real-time video communications over a fixed-rate channel, the general goal of the operational coder control is to obtain the best possible video quality while meeting the given conditions on transmission rate and decoding delay. Due to the large parameter space involved, this is a non-trivial problem. Furthermore, the operational coder control must be of sufficiently low complexity to be applicable in real-time applications.

A widely accepted approach for rate-distortion optimized encoding is the Lagrangian bit-allocation technique. The popularity of this approach is due to its effectiveness and simplicity. Given a fixed quantization parameter QP for a macroblock, the macroblock mode as well as associated block modes and motion vectors are determined by minimizing a Lagrangian cost functional D+λ(QP)·R, in which a distortion measure D is weighted against a rate term R using a Lagrangian multiplier λ. The Lagrangian multiplier λ depends only on the given macroblock quantization parameter QP. This Lagrangian coder control was successfully applied to H.263, MPEG-4 Visual and H.264/AVC by the authors [5,6,7,8]. In all cases, this improved encoding strategy provides visible performance gains compared to previous encoding strategies of H.263, MPEG-4 Visual and H.264/AVC, respectively, when the video source was coded using a fixed quantization parameter QP.

A simple and efficient macroblock rate-control algorithm for operating block-based hybrid video coders was presented in [9]. Given the target number of bits for a picture and the prediction error signals of all macroblocks inside this picture, the macroblock quantization parameters QP are adjusted in a way that the target number of bits is hit quite accurately while the distortion of the picture is minimized.

The combination of these two coder control strategies, the Lagrangian bit-allocation technique and the macroblock-based rate control algorithm, is not straightforward since the following interdependencies have to be taken into consideration:

- For the rate control, the determination of the macroblock quantization parameters QP depends on the residual signal and thus on the estimated motion vectors as well as on the chosen macroblock and block coding modes.
- For the Lagrangian optimization, the motion estimation and macroblock/block mode decision are based on the minimization of a Lagrangian cost function in which a distortion measure is weighted against a rate term using a Lagrange multiplier λ. Since the Lagrange multiplier λ(QP) is a function of the quantization parameter QP, the residual signal also depends on the quantization parameter.

Furthermore, the performance of the operational coder control must always be seen in conjunction with complexity considerations, including the avoidance of the multiple encodings that the above two items would suggest due to the interdependency of the parameters QP and λ.

Thus, the technical problem to be solved by the invention is to provide a process and an arrangement for encoding video pictures, as well as an appropriate computer program and an appropriate storage medium, which provide the rate-distortion-efficiency of the Lagrangian bit-allocation technique as well as the rate-control property.

This task is solved according to the invention by the features of claims 1, 10, 11 and 12.

The present invention solves this problem in such a way that a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one of the control-parameters which assist the encoding process is determined based on at least one estimated parameter, and, in a second step, the picture is encoded with encoding-parameters calculated based on the control-parameters determined in the pre-analysis step.

An arrangement for encoding video pictures comprises at least one chip and/or processor, wherein the chip(s) and/or processor(s) are configured such that a process for encoding video pictures can be executed in which a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one of the control-parameters which assist the encoding process is determined based on at least one estimated parameter, and, in a second step, the picture is encoded with encoding-parameters calculated based on the control-parameters determined in the pre-analysis step.

In some cases the process for encoding video pictures is advantageously carried out by a computer program. Such a computer program enables a computer, after the program has been stored in the computer's memory, to run a process for encoding video pictures, wherein the computer program contains programming code for executing a process of encoding video pictures in which a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one of the control-parameters which assist the encoding process is determined based on at least one estimated parameter, and, in a second step, the picture is encoded with encoding-parameters calculated based on the control-parameters determined in the pre-analysis step.

For instance, such a computer program can be made available (free of charge or for a fee) as a downloadable data file in a communication network or a network for data transfer. Computer programs made available in this manner can be acquired by downloading a computer program as described in claim 11 from a network for data transfer, for example the Internet, to a data processing unit that is connected to said network.

For encoding video pictures, a computer-readable storage medium is advantageously used on which a program is stored that enables the computer, after the program has been stored in the memory of the computer, to execute a process for encoding video pictures, wherein the computer program contains programming code for executing a process for encoding video pictures in which a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one of the control-parameters which assist the encoding process is determined based on at least one estimated parameter, and, in a second step, the picture is encoded with encoding-parameters calculated based on the control-parameters determined in the pre-analysis step.

In a preferred embodiment of the invention, an energy measure of the residual signal of a macroblock, representing the difference between the original macroblock samples and their prediction, is used as control parameter; it is determined based on at least one estimated parameter in the pre-analysis step.

In another preferred embodiment of the invention, the energy measure of the residual signal that is used as control parameter is calculated as the average of the variances of the residual signals of the luminance and chrominance blocks inside a macroblock that are used for transform coding, according to:

$\sigma_i^2 = \frac{1}{N_B \cdot N_P} \sum_{j=1}^{N_B} \sum_{k=1}^{N_P} \left( d_{i,j}(k) - \overline{d_{i,j}} \right)^2$

where $N_B$ and $N_P$ are the number of blocks (luminance and chrominance) used for transform coding inside a macroblock and the number of samples inside such a block, respectively, $d_{i,j}$ is the residual signal of block j inside macroblock i, and $\overline{d_{i,j}}$ represents the average of $d_{i,j}$.

In a further preferred embodiment of the invention, the prediction signal of macroblocks in intra coded pictures is assumed to consist of samples with the value zero, so that the residual signal corresponds to the original macroblock samples; for predictive coded pictures, the prediction signal of macroblocks used for determining the control parameters is estimated by motion-compensated prediction using one or more displacement vectors and reference indices that are estimated in the pre-analysis step.

In a further preferred embodiment of the invention the pre-analysis step includes the estimation of displacement vectors m and reference indices r by minimizing the Lagrangian cost function

$[\hat{m}, \hat{r}] = \arg\min_{m,r} \left\{ D_{DFD}(m,r) + \lambda_{motion} \cdot R_{MV}(m,r) \right\}$, where

$D_{DFD}(m,r) = \sum_{(x,y) \in B} \left| s(x,y,t) - s'(x - m_x, y - m_y, t_r) \right|$

determines the distortion term, $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent the arrays of luminance samples of the original picture and of the decoded reference picture given by the reference index r, respectively, $R_{MV}(m,r)$ specifies the number of bits needed to transmit all components of the displacement vector $[m_x, m_y]^T$ and the reference index r, B is the area of the macroblock, macroblock partition, or sub-macroblock partition for which the displacement vector and the reference index are estimated, and $\lambda_{motion} \geq 0$ is the Lagrangian multiplier.

In a further preferred embodiment of the invention, the Lagrangian multiplier $\lambda_{motion}$ used for displacement vector estimation in the pre-analysis step is set in accordance with $\lambda_{motion} = \sqrt{0.85 \cdot \overline{QP}^2}$ for H.263 and MPEG-4, or $\lambda_{motion} = \sqrt{0.85 \cdot 2^{(\overline{QP} - 12)/3}}$ for H.264/AVC, where $\overline{QP}$ represents the average quantization parameter of the last encoded picture of the same picture type.

In a further preferred embodiment of the invention the displacement vector estimation in the pre-analysis step is done for the entire macroblock covering an area of 16×16 luminance samples, and the reference index r is not estimated but determined in a way that it refers to the temporally closest reference picture that is stored in the decoded picture buffer.

Using the invention it is possible to perform an operational coder control for block-based hybrid video codecs that provides the rate-distortion efficiency of rate-distortion optimized encoders [5,6,7,8] as well as an accurate rate control suitable for low-delay interactive applications. The simulation results for MPEG-4 Visual and H.264/AVC show that the proposed encoding strategy achieves virtually the same rate-distortion performance as the rate-distortion optimized encoders without rate control.

The invention concerns the operational control of the macroblock layer. It is assumed that a given global rate control sets the target number of bits for a picture so that the conditions on transmission rate and decoding delay are met. The operational control of the macroblock layer determines the quantization parameters, macroblock and block modes, motion vectors, and quantized transform coefficients in such a way that this target number of bits is hit as accurately as possible while the distortion of the picture is minimized.

BRIEF DESCRIPTION OF THE DRAWINGS

The following examples are provided to describe the invention in further detail. These examples are intended to illustrate and not to limit the invention.

FIG. 1 Rate-distortion performance of our proposed encoding strategy (points) in comparison to the rate-distortion optimized encoding strategy [8] without rate control (solid line) for the Foreman sequence (QCIF, 10 pictures per second),

FIG. 2 Rate-distortion performance of our proposed encoding strategy (points) in comparison to the rate-distortion optimized encoding strategy [8] without rate control (solid line) for the Tempete sequence (CIF, 30 pictures per second),

FIG. 3 Obtained average bit-rates in comparison to the target bit-rate for the proposed encoding strategy.

DETAILED DESCRIPTION OF THE INVENTION

In the following sections, the invention is described by means of an operational coder control that uses the inventive encoding process for video pictures. The described operational coder control combines the advantages of both approaches: the rate-distortion efficiency of the Lagrangian bit-allocation technique and the rate-control property of [9].

In [5], it has been observed that Lagrangian motion estimation has only a very minor impact on the rate-distortion performance when used for H.263 baseline coding. This is because the bit-rate consumed by the motion vectors assigned to 16×16 blocks is very small, so an unsuitable choice of λ for the motion estimation process has very little effect. Therefore, in our new encoding strategy, the determination of the macroblock quantization parameters QP is based on an initial estimation of the residual signal using 16×16 blocks only. For that, the Lagrangian parameter λ is set using the average quantization parameter $\overline{QP}$ of the last encoded picture of the same picture type. The quantization parameters QP (and thus the corresponding Lagrangian parameters λ) are selected similarly to the approach of [9], using the estimated prediction error signals and the remaining bit budget. Based on these parameters, the motion vectors as well as the macroblock and block modes are chosen by minimizing the corresponding Lagrangian cost functions.

Since the subject of our invention is the suitable combination of two approaches concerning the operational control of the macroblock layer, the whole operational control algorithm is briefly described to avoid a misunderstanding of the concept. The main contribution is a simple low-cost solution of the interdependence problem between the Lagrangian bit-allocation technique and the rate control approach. This problem is solved by introducing a low-cost pre-analysis/pre-estimation step using only 16×16 blocks and a single reference picture. As a consequence, some algorithmic details of the rate control approach in [9] had to be adapted.

In the following sections 1 and 2 the whole algorithm of the operational control of the macroblock layer is described. Experimental results comparing the performance of the proposed algorithm with the constant slope approach only using Lagrangian optimization are given in section 3.

1. Initialization of Macroblock Layer Operational Control

The target number of bits for a picture, $R_{total}$, is set by a global rate control algorithm. The bit budget $R_B$ for transmitting the macroblock-layer syntax elements of that picture is initialized as

$R_B = R_{total} - R_{header}, \quad (1)$

where $R_{header}$ represents the average number of bits needed for encoding the picture and/or slice header information of the given picture type.

For predictive coded pictures, an initial motion estimation step for 16×16 blocks and the temporally closest reference picture is performed for all macroblocks i of the picture. The corresponding initial motion vectors $\hat{m}_i$ are obtained by minimizing the Lagrangian cost function

$\hat{m}_i = \arg\min_{m \in M} \left\{ D_{DFD}(i,m) + \lambda_{motion} \cdot R_{MV}(i,m) \right\} \quad (2)$

with the distortion term being given as

$D_{DFD}(i,m) = \sum_{(x,y) \in B_i} \left| s(x,y,t) - s'(x - m_x, y - m_y, t - \Delta t) \right|. \quad (3)$

$s(\ldots, t)$ and $s'(\ldots, t - \Delta t)$ represent the luminance signals of the original picture and the decoded reference picture, respectively. $R_{MV}(i,m)$ specifies the number of bits needed to transmit all components of the motion vector $[m_x, m_y]^T$, M is the motion vector search range, and $B_i$ represents the area of the i-th macroblock.
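The initial estimation in (2) and (3) can be sketched as an exhaustive integer-pel search over one 16×16 block. This is a minimal illustration under stated assumptions, not the encoder used in the experiments: pictures are plain lists of luminance rows, candidates whose reference block would leave the picture are skipped, and the rate term $R_{MV}$ is modelled, as an assumption, by the signed Exp-Golomb code length (the se(v) code of H.264/AVC) of each vector component relative to a hypothetical predictor `pred`. The names `se_bits`, `sad` and `motion_search` are illustrative.

```python
def se_bits(v):
    """Length in bits of the signed Exp-Golomb code se(v) for integer v (assumed rate model)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def sad(cur, ref, bx, by, mx, my, size=16):
    """D_DFD of Eq. (3): sum of |s(x, y, t) - s'(x - mx, y - my, t - dt)| over block B_i."""
    return sum(
        abs(cur[y][x] - ref[y - my][x - mx])
        for y in range(by, by + size)
        for x in range(bx, bx + size)
    )


def motion_search(cur, ref, bx, by, search_range, lam_motion, pred=(0, 0), size=16):
    """Eq. (2): return the vector minimizing D_DFD + lam_motion * R_MV, and its cost."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block lies partly outside the picture.
            if not (0 <= bx - mx and bx - mx + size <= width
                    and 0 <= by - my and by - my + size <= height):
                continue
            distortion = sad(cur, ref, bx, by, mx, my, size)
            # Hypothetical rate model: se(v) length of each component of m - pred.
            rate = se_bits(mx - pred[0]) + se_bits(my - pred[1])
            cost = distortion + lam_motion * rate
            if cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


# Synthetic test pictures: cur is ref displaced by (2, 1), so s(x, y) = s'(x - 2, y - 1).
ref = [[(3 * x * x + 5 * y * y + 7 * x * y) % 251 for x in range(32)] for y in range(32)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(32)] for y in range(32)]
mv, cost = motion_search(cur, ref, 8, 8, 4, 1.0)
```

A real encoder would use a fast search and the actual motion vector predictor; the full search is used here only because it is the simplest correct way to evaluate the argmin in (2).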

For this initial estimation step, the Lagrangian multiplier $\lambda_{motion}$ is set using the average quantization parameter $\overline{QP}$ of the last encoded picture of the same picture type:

H.263, MPEG-4: $\lambda_{motion} = \sqrt{0.85 \cdot \overline{QP}^2} \quad (4)$

JVT/H.264: $\lambda_{motion} = \sqrt{0.85 \cdot 2^{(\overline{QP} - 12)/3}} \quad (5)$
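Equations (4) and (5) only need the average quantization parameter of the last picture of the same type. A small helper follows; the codec labels `"h263"`, `"mpeg4"` and `"h264"` and the example QP values are chosen here purely for illustration.

```python
import math


def lambda_motion_preanalysis(avg_qp, codec):
    """Lagrangian multiplier for the initial 16x16 motion search, Eqs. (4) and (5).

    avg_qp: average QP of the last encoded picture of the same picture type.
    codec:  "h263" or "mpeg4" for Eq. (4), "h264" for Eq. (5) (illustrative labels).
    """
    if codec in ("h263", "mpeg4"):
        return math.sqrt(0.85 * avg_qp ** 2)
    if codec == "h264":
        return math.sqrt(0.85 * 2 ** ((avg_qp - 12) / 3))
    raise ValueError(f"unsupported codec: {codec}")


# Example: average over the macroblock QPs of the previous P-picture (hypothetical values).
prev_qps = [28, 30, 29, 29]
lam = lambda_motion_preanalysis(sum(prev_qps) / len(prev_qps), "h264")
```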

Based on this initial estimation, or on the original source data for intra pictures, a variance measure $\sigma_i^2$ is calculated for each macroblock according to the invention:

$\sigma_i^2 = \frac{1}{N_B \cdot N_P} \sum_{j=1}^{N_B} \sum_{k=1}^{N_P} \left( d_{i,j}(k) - \overline{d_{i,j}} \right)^2. \quad (6)$

$N_B$ and $N_P$ are the number of blocks (luminance and chrominance) used for transform coding inside a macroblock and the number of samples inside such a block, respectively. $d_{i,j}$ represents the residual signal of block j inside macroblock i; its average is denoted by $\overline{d_{i,j}}$. For intra pictures, this residual signal corresponds to the original macroblock samples; for predictive coded pictures, it represents the prediction error signal.
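A direct transcription of (6) is sketched below, assuming each transform block is given as a flat list of residual samples. With 4:2:0 sampling and a 4×4 transform (H.264/AVC) one would have $N_B = 24$ (16 luminance plus 8 chrominance blocks) and $N_P = 16$; with an 8×8 DCT (MPEG-4) $N_B = 6$ and $N_P = 64$. The function names are illustrative.

```python
def residual_blocks(orig_blocks, pred_blocks=None):
    """d_{i,j}: original minus prediction; for intra pictures the prediction is zero."""
    if pred_blocks is None:
        return [list(block) for block in orig_blocks]
    return [[o - p for o, p in zip(ob, pb)] for ob, pb in zip(orig_blocks, pred_blocks)]


def variance_measure(blocks):
    """sigma_i^2 of Eq. (6): the average over the N_B transform blocks of each block's variance."""
    n_b, n_p = len(blocks), len(blocks[0])
    total = 0.0
    for block in blocks:
        mean = sum(block) / n_p                       # average of d_{i,j}
        total += sum((d - mean) ** 2 for d in block)  # squared deviations in block j
    return total / (n_b * n_p)
```

Because each block's own mean is subtracted, a residual that is constant within every transform block (pure DC) yields $\sigma_i^2 = 0$.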

Based on the variance measures, a weighting factor α_(i) is assigned to each macroblock i according to (see [9])

$\alpha_i = \begin{cases} 2 \cdot R_B / N \cdot (1 - \sigma_i) + \sigma_i, & R_B / N < 0.5 \\ 1, & \text{otherwise} \end{cases} \quad (7)$

where N is the number of macroblocks inside the picture.
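Equation (7) is reproduced literally below, with the threshold comparing $R_B/N$ against 0.5 exactly as written in the text; $\sigma_i$ is taken as the square root of the measure in (6). This is a sketch, and the function name is illustrative.

```python
def weighting_factor(sigma_i, r_b, n_mbs):
    """alpha_i of Eq. (7); sigma_i is the square root of the measure in Eq. (6)."""
    budget_ratio = r_b / n_mbs          # R_B / N as written in Eq. (7)
    if budget_ratio < 0.5:
        # Low budget: alpha_i moves from 1 toward sigma_i as R_B / N decreases.
        return 2 * budget_ratio * (1 - sigma_i) + sigma_i
    return 1.0
```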

The following parameters are set to their initial values [9]:

- remaining complexity measure: $S_1 = \sum_{i=1}^{N} \alpha_i \cdot \sigma_i$
- remaining macroblocks: $N_1 = N$
- remaining bit budget: $B_1 = R_B$
- model parameters: $K_1 = K_N$ (last picture of same type), $C_1 = C_N$ (last picture of same type), $j_K = 0$

For the first picture of a sequence, the model parameters K₁ and C₁ are set to some predefined values.
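The initialization of Section 1 can be collected into one state object, as sketched below. The field and function names are illustrative; `k_prev` and `c_prev` stand for $K_N$ and $C_N$ of the last picture of the same type, or for the predefined start values of the first picture, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class MacroblockLayerState:
    """Running quantities of the macroblock-layer control (Sections 1 and 2.4)."""
    S: float        # remaining complexity measure S_i
    N: int          # remaining macroblocks N_i
    B: float        # remaining bit budget B_i
    K: float        # model parameter K_i
    C: float        # model parameter C_i
    K1: float       # K_1, kept for the blending in Eq. (21)
    C1: float       # C_1, kept for the blending in Eq. (22)
    j_K: int = 0    # number of macroblocks that contributed to K_F


def init_state(r_total, r_header, sigmas, alphas, k_prev, c_prev):
    """Initialize the state for one picture from R_total, R_header, sigma_i and alpha_i."""
    r_b = r_total - r_header                           # Eq. (1)
    s_1 = sum(a * s for a, s in zip(alphas, sigmas))   # S_1 = sum of alpha_i * sigma_i
    return MacroblockLayerState(S=s_1, N=len(sigmas), B=r_b,
                                K=k_prev, C=c_prev, K1=k_prev, C1=c_prev)
```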

2. Operational Control of the Macroblock Layer

2.1. Target Quantization Parameter Setting

The target quantization step size $Q_i^*$ for the i-th macroblock is set according to (see [9])

$Q_i^* = \begin{cases} \max\left(Q_{min}, \min\left(Q_{max}, \sqrt{\dfrac{K_i \cdot \sigma_i \cdot S_i}{\alpha_i \cdot (B_i - N_i \cdot C_i)}}\right)\right), & B_i > N_i \cdot C_i \\ Q_{max}, & B_i \leq N_i \cdot C_i \end{cases} \quad (8)$

where $Q_{min}$ and $Q_{max}$ are the minimum and maximum quantization step sizes supported by the syntax.
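Equation (8), together with the limitation to $\pm\Delta QP_{max}$ around the previous QP that (9) applies, can be sketched as follows. The step-size mapping $f_Q$ depends on the syntax; as an assumption for illustration, `f_q_h264` uses the H.264/AVC convention that the step size is 1.0 at QP 4 and doubles every 6 QP steps (so $Q_{min} = 0.625$ and $Q_{max} = 224$ there). For H.263 the step size is $2 \cdot QP$ instead.

```python
import math


def target_qstep(K, C, sigma_i, S, alpha_i, B, N, q_min, q_max):
    """Q_i* of Eq. (8) from the current model state."""
    if B > N * C:
        q = math.sqrt(K * sigma_i * S / (alpha_i * (B - N * C)))
        return max(q_min, min(q_max, q))
    return q_max  # remaining budget does not exceed the N_i * C_i part of the model


def f_q_h264(qstep):
    """Assumed H.264/AVC mapping f_Q: step size 1.0 at QP 4, doubling every 6 QP."""
    return round(4 + 6 * math.log2(qstep))


def target_qp(qstep, qp_prev, dqp_max, f_q):
    """QP_i* of Eq. (9): f_Q(Q_i*) limited to +/- dqp_max around QP_{i-1}."""
    return max(qp_prev - dqp_max, min(qp_prev + dqp_max, f_q(qstep)))
```

In the usage below (hypothetical numbers) the model asks for a step size of 4.0, i.e. QP 16, but with $QP_{i-1} = 20$ and $\Delta QP_{max} = 2$ the target is limited to 18.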

Based on the target quantization step size, the target quantization parameter $QP_i^*$ is set according to (see [9])

$QP_i^* = \max\left(QP_{i-1} - \Delta QP_{max}, \min\left(QP_{i-1} + \Delta QP_{max}, f_Q(Q_i^*)\right)\right), \quad (9)$

where $QP_{i-1}$ is the quantization parameter of the last macroblock and $\Delta QP_{max}$ is the maximum allowed change of the quantizer (given by the syntax or user-defined). The function $f_Q(\ldots)$ specifies the mapping of quantization step sizes onto quantization parameters; it depends on the underlying syntax.

2.2. Macroblock Motion Estimation and Mode Decision

The Lagrangian multipliers used for motion estimation and mode decision of macroblock i are set according to [5], based on the chosen target quantization parameter, as follows:

H.263, MPEG-4: $(\lambda_{motion,i})^2 = \lambda_{mode,i} = 0.85 \cdot (QP_i^*)^2 \quad (10)$

H.264/AVC: $(\lambda_{motion,i})^2 = \lambda_{mode,i} = 0.85 \cdot 2^{(QP_i^* - 12)/3} \quad (11)$
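As a worked illustration of (10) and (11) (our own numbers, not taken from the experiments), a target $QP_i^* = 28$ in H.264/AVC gives

$\lambda_{mode,i} = 0.85 \cdot 2^{(28-12)/3} = 0.85 \cdot 2^{16/3} \approx 0.85 \cdot 40.32 \approx 34.27, \qquad \lambda_{motion,i} = \sqrt{34.27} \approx 5.85,$

while a target $QP_i^* = 10$ in H.263 or MPEG-4 gives $\lambda_{mode,i} = 0.85 \cdot 10^2 = 85$ and $\lambda_{motion,i} = \sqrt{85} \approx 9.22$. The motion multiplier is the square root of the mode multiplier, which fits the distortion in (12) and (13) being a sum of absolute differences while the mode decision uses squared differences.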

For all motion-compensated macroblock/block modes the associated motion vectors m_(i) and reference indices r_(i) (H.263 Annex U and H.264/AVC) are obtained by minimizing the Lagrangian functional (cf. (2))

$[m_i, r_i] = \arg\min_{m \in M, r \in R} \left\{ D_{DFD}(i,m,r) + \lambda_{motion,i} \cdot R_{MV}(i,m,r) \right\} \quad (12)$

with the distortion term being given as

$D_{DFD}(i,m,r) = \sum_{(x,y) \in B_i} \left| s(x,y,t) - s'(x - m_x, y - m_y, t_r) \right|. \quad (13)$

Here, R denotes the set of reference pictures stored in the decoded picture buffer, M specifies the motion vector search range inside a reference picture, $t_r$ is the sampling time of the reference picture referred to by the reference index r, $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent the luminance signals of the original picture and the decoded reference picture, respectively, and $R_{MV}(i,m,r)$ specifies the number of bits needed to transmit all components of the motion vector $m = [m_x, m_y]^T$ as well as the reference index r.

The determination of the macroblock (or block) modes for a given macroblock (or block) follows basically the same approach. From a given set of possible macroblock/block modes $S_{mode}$, the mode $p_i$ that minimizes the following Lagrangian cost function is chosen:

$p_i = \arg\min_{p \in S_{mode}} \left\{ D_{REC}(i, p \mid QP_i^*) + \lambda_{mode} \cdot R_{all}(i, p \mid QP_i^*) \right\}. \quad (14)$
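The mode decision of (14) reduces to an argmin once each candidate mode has been coded with $QP_i^*$. The sketch below assumes the reconstruction and the total bit count $R_{all}$ of every candidate are already available; the mode labels are hypothetical, and the distortion is the sum of squared differences of (15).

```python
def ssd(orig, recon):
    """D_REC of Eq. (15): sum of squared differences between original and reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(orig, recon))


def choose_mode(orig, candidates, lam_mode):
    """Eq. (14): pick the mode p minimizing D_REC + lam_mode * R_all.

    candidates maps a mode label to (reconstructed samples, R_all in bits), both
    obtained by actually coding the macroblock in that mode with QP_i*.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (recon, bits) in candidates.items():
        cost = ssd(orig, recon) + lam_mode * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


orig = [10, 10, 10, 10]
cands = {"SKIP": ([8, 8, 8, 8], 1), "INTER_16x16": ([10, 10, 10, 11], 40)}
```

With a large $\lambda_{mode}$ the cheap but distorted mode wins; with a small one the accurate but expensive mode wins, which is exactly the trade-off that $QP_i^*$ steers through (10) and (11).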

The distortion measure represents the sum of squared differences between the original macroblock/block samples s and the reconstructed samples s′

$D_{REC}(i, p \mid QP_i^*) = \sum_{(x,y) \in B} \left( s(x,y) - s'(x, y \mid p, QP_i^*) \right)^2, \quad (15)$

where B specifies the set of corresponding macroblock/block samples. $R_{all}(i, p \mid QP_i^*)$ is the number of bits associated with choosing the mode p and the quantization parameter $QP_i^*$; it includes the bits for the macroblock header, the motion vectors and reference indices as well as the quantized transform coefficients of all luminance and chrominance blocks.

2.3. Final Setting of the Quantization Parameter

The quantization parameter $QP_i$ used for transmitting the macroblock syntax elements depends on the chosen macroblock mode and its associated parameters, such as the quantized transform coefficients. If the syntax allows a change of the quantizer for the chosen macroblock parameters, the quantization parameter $QP_i = QP_i^*$ is chosen; otherwise the quantization parameter of the last macroblock is taken: $QP_i = QP_{i-1}$.
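The rule above can be sketched as a single selection. Whether the syntax can signal a quantizer change is codec-specific; as an illustration, `h264_qp_change_allowed` encodes our reading of H.264/AVC, where mb_qp_delta is only present for non-skipped macroblocks that have coded luminance or chrominance coefficients or use Intra_16x16 prediction (I_PCM is ignored here). The mode labels are hypothetical strings.

```python
SKIP_TYPES = {"P_Skip", "B_Skip"}


def h264_qp_change_allowed(mb_type, cbp_luma, cbp_chroma):
    """Assumed H.264/AVC rule for when a QP change (mb_qp_delta) can be transmitted."""
    if mb_type in SKIP_TYPES:
        return False
    return mb_type == "I_16x16" or cbp_luma > 0 or cbp_chroma > 0


def final_qp(qp_target, qp_prev, change_allowed):
    """QP_i = QP_i* if the syntax can signal a change, otherwise QP_{i-1}."""
    return qp_target if change_allowed else qp_prev
```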

2.4. Model Update for the Operational Macroblock Layer Control

After the encoding of a macroblock is finished, the model parameters of the operational coder control are updated. In a first step, the so-called macroblock parameters $K_{MB}$ and $C_{MB}$ are calculated according to the invention:

$K_{MB} = Q_i^{**} \cdot \left( R_{all,i} - R_{MV}(\hat{m}_i) \right) / \sigma_i^2 \quad (16)$

$C_{MB} = R_{MV}(\hat{m}_i) \quad (17)$

where $Q_i^{**}$ denotes the quantization step size that corresponds to the target quantization parameter $QP_i^*$: $Q_i^{**} = f_Q^{-1}(QP_i^*)$. $R_{all,i}$ is the number of bits used for encoding the considered macroblock including all syntax elements, and $R_{MV}(\hat{m}_i)$ is the number of bits associated with the motion vector $\hat{m}_i$, which has been estimated in the initialization step (Section 1). The average model parameters of the currently encoded picture, $K_F$ and $C_F$, are set according to (see [9])

$C_F = C_F \cdot (i-1)/i + C_{MB}/i \quad (18)$

and, only if $K_{MB} > 0$ and $K_{MB} < 1000$:

$j_K = j_K + 1 \quad (19)$

$K_F = K_F \cdot (j_K - 1)/j_K + K_{MB}/j_K \quad (20)$
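The update in (16) to (20), together with the per-macroblock bookkeeping of (21) and (22) that follows in the text, can be sketched as one function. The formulas are transcribed as written; the guard against $\sigma_i^2 = 0$ and starting $K_F$ and $C_F$ at $K_1$ and $C_1$ are our own assumptions (the text leaves both open, and the start values cancel at the first update of each running average). All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ModelState:
    S: float     # remaining complexity measure S_i
    N: int       # remaining macroblocks N_i
    B: float     # remaining bit budget B_i
    K: float     # K_i used for the next Q_i* in Eq. (8)
    C: float     # C_i used for the next Q_i* in Eq. (8)
    K1: float    # K_1 from the initialization
    C1: float    # C_1 from the initialization
    K_F: float   # running picture average of K_MB (assumed to start at K_1)
    C_F: float   # running picture average of C_MB (assumed to start at C_1)
    j_K: int = 0 # macroblocks that contributed to K_F


def update_after_macroblock(st, i, n_total, q_step, r_all, r_mv_init, sigma_sq, alpha_i):
    """Apply Eqs. (16)-(22) after macroblock i (1-based) has been encoded.

    q_step: Q_i**, the step size of QP_i*; r_all: bits spent on the macroblock;
    r_mv_init: R_MV of the initial vector from Section 1; sigma_sq: sigma_i^2.
    """
    c_mb = r_mv_init                                              # Eq. (17)
    st.C_F = st.C_F * (i - 1) / i + c_mb / i                      # Eq. (18)
    if sigma_sq > 0:  # guard added here; the text does not treat sigma_i^2 = 0
        k_mb = q_step * (r_all - r_mv_init) / sigma_sq            # Eq. (16), as written
        if 0 < k_mb < 1000:
            st.j_K += 1                                           # Eq. (19)
            st.K_F = st.K_F * (st.j_K - 1) / st.j_K + k_mb / st.j_K  # Eq. (20)
    st.S -= alpha_i * sigma_sq ** 0.5                             # S_{i+1}
    st.N -= 1                                                     # N_{i+1}
    st.B -= r_all                                                 # B_{i+1}
    st.K = st.K_F * i / n_total + st.K1 * (n_total - i) / n_total  # Eq. (21)
    st.C = st.C_F * i / n_total + st.C1 * (n_total - i) / n_total  # Eq. (22)
    return st
```

Blending with $K_1$ and $C_1$ in (21) and (22) keeps the model close to the previous picture's values at the start of a picture and shifts it toward the current picture's measurements as more macroblocks are coded.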

Based on these parameters the model parameters used for encoding the following macroblock are updated as follows (see [9]):

- remaining complexity measure: $S_{i+1} = S_i - \alpha_i \cdot \sigma_i$
- remaining macroblocks: $N_{i+1} = N_i - 1$
- remaining bit budget: $B_{i+1} = B_i - R_{all,i}$
- model parameters:
  $K_{i+1} = K_F \cdot i/N + K_1 \cdot (N - i)/N \quad (21)$
  $C_{i+1} = C_F \cdot i/N + C_1 \cdot (N - i)/N \quad (22)$

3. Experimental Results

The efficiency of our new encoding strategy is demonstrated for the H.264/AVC video coding standard by comparing it to the encoding strategy using only Lagrangian optimization (with a fixed value of the quantization parameter for the whole sequence). Both encoders use only one Intra picture at the beginning of the sequence; all following pictures are coded as predictive coded P-pictures. In both cases five reference pictures are used. The motion estimation is done by a logarithmic integer-pixel search over the range of [−32, 32]×[−32, 32] samples and a subsequent half- and quarter-pixel refinement. The entropy coding is done using context-adaptive binary arithmetic coding (CABAC).

For our new encoding strategy the following simple global rate control technique was used. Given the number N of pictures to be encoded, the target average bit-rate R in kbit/sec, and the picture rate F in Hz, the target number of bits B₁* for the first Intra picture i=1 is determined by

$B_{1}^{*} = {\frac{6000 \cdot N \cdot R}{F \cdot \left( {N + 5} \right)}.}$

For all remaining P-pictures i>1, the target bit budget is set to

$B_{i}^{*} = {\frac{1}{N - i + 1} \cdot \left( {\frac{1000 \cdot N \cdot R}{F} - {\sum\limits_{k = 1}^{i - 1}\; B_{k}}} \right)}$ where B_(k) denotes the number of bits actually consumed by the k-th picture.
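The two target formulas can be sketched directly. With $T = 1000 \cdot N \cdot R / F$ bits for the whole sequence, $B_1^* = 6T/(N+5)$ gives the Intra picture the share of six P-pictures; if that target is hit exactly, every remaining P-picture then receives $T/(N+5) = B_1^*/6$. The function names are illustrative.

```python
def intra_target_bits(n_pictures, rate_kbps, fps):
    """B_1* for the first (Intra) picture."""
    return 6000 * n_pictures * rate_kbps / (fps * (n_pictures + 5))


def p_target_bits(bits_used, n_pictures, rate_kbps, fps):
    """B_i* for P-picture i = len(bits_used) + 1, given the bits B_k actually spent so far."""
    i = len(bits_used) + 1
    total_bits = 1000 * n_pictures * rate_kbps / fps
    return (total_bits - sum(bits_used)) / (n_pictures - i + 1)
```

For example, N = 100 pictures at R = 64 kbit/s and F = 10 Hz give $B_1^* \approx 36\,571$ bits; if the Intra picture uses exactly that, the second picture is assigned about 6 095 bits, one sixth of it. Any over- or undershoot of earlier pictures is spread evenly over the pictures that remain.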

In FIGS. 1 and 2, the rate-distortion performance of both encoders is compared for two test sequences with different characteristics. The curves show the average PSNR of the luminance component versus the average bit-rate measured over the complete bit-stream. It can be seen that our proposed encoding strategy provides virtually the same rate-distortion efficiency as the rate-distortion optimized encoder without rate control [8], while the target bit-rate is accurately hit.

The obtained average bit-rates for our proposed encoder are shown in Table 1 together with the target bit-rates.

REFERENCES

- [1] ITU-T and ISO/IEC JTC1, “Generic coding of moving pictures and associated audio information—Part 2: Video,” ITU-T Recommendation H.262—ISO/IEC 13818-2 (MPEG-2), November 1994.
- [2] ITU-T, “Video coding for low bitrate communication,” ITU-T Recommendation H.263; version 1, November 1995; version 2, January 1998.
- [3] ISO/IEC JTC1, “Coding of audio-visual objects—Part 2: Visual,” ISO/IEC 14496-2 (MPEG-4 Visual version 1), April 1999; Amendment 1 (version 2), February 2000.
- [4] T. Wiegand, G. Sullivan, A. Luthra, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 ISO/IEC 14496-10 AVC),” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050r1, May 2003.
- [5] T. Wiegand, B. D. Andrews, “An Improved H.263 Coder Using Rate-Distortion Optimization,” Doc. ITU-T/SG17/Q15-D-13, April 1998.
- [6] G. J. Sullivan, T. Wiegand, “Rate-Distortion Optimization for Video Compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, November 1998.
- [7] H. Schwarz, T. Wiegand, “An Improved MPEG-4 Coder Using Lagrangian Coder Control,” ITU-T/SG16/Q6/VCEG-M49, April 2001, Austin, Tex., USA.
- [8] H. Schwarz, T. Wiegand, “An Improved H.26L Coder Using Lagrangian Coder Control,” ITU-T/SG16/Q6/VCEG-D146, June 2001, Porto Seguro, Brazil.
- [9] J. Ribas-Corbera, S. Lei, “Rate Control in DCT Video Coding for Low-Delay Communications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, February 1999.

1. Process for encoding video pictures in a video coder, wherein a pre-analysis of pictures is carried out, wherein for at least a part of macroblocks at least one control-parameter which assists the encoding process is determined based on at least one estimated parameter, in a second step the picture is encoded on macroblock level with encoding-parameters calculated based on the control-parameters determined in the pre-analysis step, wherein the macroblock encoding process comprises the following steps given a target quantization parameter $QP_i^*$: calculating Lagrangian multipliers used for motion estimation and mode decision of a macroblock i according to: $(\lambda_{motion,i})^2 = \lambda_{mode,i} = 0.85 \cdot (QP_i^*)^2$ for H.263, MPEG-4, or $(\lambda_{motion,i})^2 = \lambda_{mode,i} = 0.85 \cdot 2^{(QP_i^* - 12)/3}$ for H.264/AVC; for all motion-compensated macroblock/block modes determining associated motion vectors $m_i$ and a reference index $r_i$ by minimizing a Lagrangian functional $[m_i, r_i] = \arg\min_{m \in M, r \in R} \left\{ D_{DFD}(i,m,r) + \lambda_{motion,i} \cdot R_{MV}(i,m,r) \right\}$, with a distortion term being given as $D_{DFD}(i,m,r) = \sum_{(x,y) \in B_i} \left| s(x,y,t) - s'(x - m_x, y - m_y, t_r) \right|$, where $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent an array of luminance samples of an original picture and a decoded reference picture given by a reference index r, respectively, R denotes a set of reference pictures stored in the decoded picture buffer, M specifies a motion vector search range inside a reference picture, $t_r$ is a sampling time of a reference picture referred to by the reference index r, $B_i$ is the area of a corresponding block or macroblock, and $R_{MV}(i,m,r)$ specifies a number of bits needed to transmit all components of a motion vector $m = [m_x, m_y]^T$ as well as the reference index r; determining macroblock/block encoding modes $p_i$ of a macroblock i by minimizing a Lagrangian cost function $p_i = \arg\min_{p \in S_{mode}} \left\{ D_{REC}(i, p \mid QP_i^*) + \lambda_{mode} \cdot R_{all}(i, p \mid QP_i^*) \right\}$, where $D_{REC}(i, p \mid QP_i^*)$ is defined as $D_{REC}(i, p \mid QP_i^*) = \sum_{(x,y) \in B} \left( s(x,y) - s'(x, y \mid p, QP_i^*) \right)^2$, $s(\ldots)$ and $s'(\ldots)$ represent an array of original macroblock samples and their reconstruction, respectively, B specifies a set of corresponding macroblock/block samples, $R_{all}(i, p \mid QP_i^*)$ is a number of bits associated with choosing a mode p and quantization parameter $QP_i^*$, including bits for the macroblock/block modes, motion vectors and reference indices as well as quantized transform coefficients of all luminance and chrominance blocks, and $S_{mode}$ is a given set of possible macroblock/block modes.
2. Process according to claim 1, wherein an energy measure of the residual signal of a macroblock, representing the difference between the original macroblock samples and their prediction, is used as control parameter, which is determined based on at least one estimated parameter in the pre-analysis step.
3. Process according to claim 2, wherein the energy measure of the residual signal is calculated as the average of the variances of the residual signals of the luminance and chrominance blocks inside a macroblock $i$ that are used for transform coding, according to $\sigma_i^2 = \frac{1}{N_B \cdot N_P} \sum_{j=1}^{N_B} \sum_{k=1}^{N_P} \left( d_{i,j}(k) - \overline{d_{i,j}} \right)^2$, where $N_B$ and $N_P$ are the number of blocks (luminance and chrominance) used for transform coding inside a macroblock and the number of samples inside such a block, respectively, $d_{i,j}$ is the residual signal of the block $j$ inside the macroblock $i$, and $\overline{d_{i,j}}$ represents the average of $d_{i,j}$.
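A minimal sketch of the energy measure $\sigma_i^2$ of claim 3. It assumes that the residual blocks of one macroblock are given as flat lists of equal length, for example 24 blocks of 4×4 samples for an H.264/AVC macroblock in 4:2:0 format.

```python
from typing import Sequence


def residual_energy(blocks: Sequence[Sequence[float]]) -> float:
    """Average of the per-block residual variances, sigma_i^2 of claim 3.

    Assumption: `blocks` holds the N_B transform blocks (luminance and
    chrominance) of one macroblock, each a flat list of its N_P residual
    samples d_{i,j}(k) = original - prediction, all of the same size.
    """
    n_b = len(blocks)
    n_p = len(blocks[0])
    total = 0.0
    for d in blocks:
        if len(d) != n_p:
            raise ValueError("all blocks must contain N_P samples")
        mean = sum(d) / n_p                          # mean of d_{i,j}
        total += sum((x - mean) ** 2 for x in d)     # inner sum over k
    return total / (n_b * n_p)
```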
 4. Process according to claim 1, wherein for predictive coded pictures, the prediction signal of macroblocks used for determining the control parameters is estimated by motion compensated prediction using one or more displacement vectors and reference indices that are estimated in the pre-analysis step.
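The motion-compensated prediction of claim 4 can be sketched for a single displacement vector as follows, using the sign convention $s'(x - m_x, y - m_y, t_r)$ of the distortion terms above. Integer-sample accuracy and clamping of positions at the picture border are simplifying assumptions of this sketch; the standards also use fractional-sample interpolation.

```python
from typing import List, Tuple

Frame = List[List[int]]


def mc_residual(cur: Frame, ref: Frame, x0: int, y0: int, size: int,
                mv: Tuple[int, int]) -> List[List[int]]:
    """Residual of the size x size block at (x0, y0) of `cur`, predicted
    from `ref` displaced by the integer vector mv = (m_x, m_y):
        d(x, y) = s(x, y, t) - s'(x - m_x, y - m_y, t_r)
    Assumption: positions outside `ref` are clamped to the border.
    """
    h, w = len(ref), len(ref[0])
    mx, my = mv
    res = []
    for y in range(y0, y0 + size):
        row = []
        for x in range(x0, x0 + size):
            ry = min(max(y - my, 0), h - 1)
            rx = min(max(x - mx, 0), w - 1)
            row.append(cur[y][x] - ref[ry][rx])
        res.append(row)
    return res
```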
5. Process according to claim 1, wherein the pre-analysis step includes an estimation of displacement vectors $m$ and reference index $r$ by minimizing a Lagrangian cost function $[\hat{m}, r] = \arg\min_{m,r} \left\{ D_{DFD}(m,r) + \lambda_{\text{motion}} \cdot R_{MV}(m,r) \right\}$, where $D_{DFD}(m,r) = \sum_{(x,y) \in B} \left| s(x,y,t) - s'(x - m_x,\, y - m_y,\, t_r) \right|$ determines a distortion term, $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent an array of luminance samples of an original picture and of a decoded reference picture given by the reference index $r$, respectively, $R_{MV}(m,r)$ specifies the number of bits needed to transmit all components of a displacement vector $[m_x, m_y]^T$ and the reference index $r$, $B$ is the area of a macroblock, macroblock partition, or sub-macroblock partition for which the displacement vector and the reference index are estimated, and $\lambda_{\text{motion}} \ge 0$ is the Lagrangian multiplier.
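A full-search sketch of the pre-analysis estimation of claim 5. The bit count $R_{MV}(m,r)$ is approximated here, as a hypothetical stand-in, by the lengths of signed Exp-Golomb code words for the vector components plus an unsigned Exp-Golomb code word for $r$; an actual H.264/AVC encoder codes the difference to a predicted vector instead. Integer-sample search and border clamping are further assumptions.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]


def ue_bits(code_num: int) -> int:
    """Length in bits of an unsigned Exp-Golomb code word."""
    return 2 * (code_num + 1).bit_length() - 1


def se_bits(v: int) -> int:
    """Length in bits of a signed Exp-Golomb code word (se(v) mapping)."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)


def sad(cur: Frame, ref: Frame, x0: int, y0: int, size: int, mx: int, my: int) -> int:
    """D_DFD: sum of |s(x, y, t) - s'(x - m_x, y - m_y, t_r)| over block B."""
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            ry = min(max(y - my, 0), h - 1)   # assumption: clamp at border
            rx = min(max(x - mx, 0), w - 1)
            total += abs(cur[y][x] - ref[ry][rx])
    return total


def estimate_motion(cur: Frame, refs: Sequence[Frame], x0: int, y0: int,
                    size: int, search: int, lam: float) -> Tuple[Tuple[int, int], int]:
    """Full search for [m, r] = argmin D_DFD(m, r) + lam * R_MV(m, r)."""
    best, best_cost = ((0, 0), 0), float("inf")
    for r, ref in enumerate(refs):
        for my in range(-search, search + 1):
            for mx in range(-search, search + 1):
                # Hypothetical rate model standing in for R_MV(m, r).
                rate = se_bits(mx) + se_bits(my) + ue_bits(r)
                cost = sad(cur, ref, x0, y0, size, mx, my) + lam * rate
                if cost < best_cost:
                    best, best_cost = ((mx, my), r), cost
    return best
```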
6. Process according to claim 5, wherein the Lagrangian multiplier $\lambda_{\text{motion}}$ used for displacement vector estimation in the pre-analysis step is set in accordance with $\lambda_{\text{motion}} = \sqrt{0.85 \cdot \overline{QP}^2}$ for H.263 and MPEG-4, or $\lambda_{\text{motion}} = \sqrt{0.85 \cdot 2^{(\overline{QP} - 12)/3}}$ for H.264/AVC, where $\overline{QP}$ represents the average quantization parameter of the last encoded picture of the same picture type.
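Claim 6 derives $\lambda_{\text{motion}}$ from the average quantization parameter of the last encoded picture of the same type. The sketch below assumes a hypothetical bookkeeping structure `last_qps` holding the macroblock QPs of that picture per picture type, and takes their arithmetic mean as $\overline{QP}$.

```python
import math
from typing import Dict, Sequence


def pre_analysis_lambda_motion(last_qps: Dict[str, Sequence[int]],
                               picture_type: str, standard: str) -> float:
    """lambda_motion for pre-analysis displacement estimation (claim 6).

    Assumption: last_qps maps a picture type (e.g. "P", "B") to the
    macroblock QPs of the last encoded picture of that type; their mean
    is used as the average quantization parameter QP-bar.
    """
    qps = last_qps[picture_type]
    qp_avg = sum(qps) / len(qps)
    if standard in ("H.263", "MPEG-4"):
        return math.sqrt(0.85 * qp_avg ** 2)
    if standard == "H.264":
        return math.sqrt(0.85 * 2.0 ** ((qp_avg - 12) / 3.0))
    raise ValueError(f"unsupported standard: {standard}")
```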
 7. Process according to claim 1, wherein a displacement vector estimation in the pre-analysis step is done for the entire macroblock covering an area of 16×16 luminance samples, and the reference index r is not estimated but determined in a way that it refers to the temporally closest reference picture that is stored in the decoded picture buffer.
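The two simplifications of claim 7 can be sketched as follows: the pre-analysis works on whole 16×16 macroblocks, and the reference index is fixed to the picture in the decoded picture buffer whose sampling time is nearest to the current picture. Using the position in the hypothetical list `ref_times` as the reference index is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

MB_SIZE = 16  # pre-analysis uses entire 16x16 luminance macroblocks only


def macroblock_origins(width: int, height: int) -> List[Tuple[int, int]]:
    """Top-left luminance positions of all 16x16 macroblocks, raster order."""
    return [(x, y) for y in range(0, height, MB_SIZE)
            for x in range(0, width, MB_SIZE)]


def closest_reference_index(ref_times: Sequence[float], t: float) -> int:
    """Reference index r whose sampling time t_r is temporally closest to t.

    Assumption: ref_times[r] is the sampling time of the picture that
    reference index r refers to in the decoded picture buffer.
    """
    if not ref_times:
        raise ValueError("decoded picture buffer is empty")
    return min(range(len(ref_times)), key=lambda r: abs(t - ref_times[r]))
```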
8. Process according to claim 1, wherein during the encoding process a target quantization parameter $QP_i^*$ is determined for each macroblock $i$ depending on the control parameters estimated in the pre-analysis step.
9. Arrangement with at least one chip and/or processor configured such that a process for encoding video pictures can be executed in which a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one control parameter which assists the encoding process is determined based on at least one estimated parameter, and in a second step the picture is encoded with encoding parameters calculated based on the control parameters determined in the pre-analysis step, wherein the macroblock encoding process comprises the following steps given a target quantization parameter $QP_i^*$: calculating the Lagrangian multipliers used for motion estimation and mode decision of a macroblock $i$ according to $(\lambda_{\text{motion},i})^2 = \lambda_{\text{mode},i} = 0.85 \cdot (QP_i^*)^2$ for H.263 and MPEG-4, or $(\lambda_{\text{motion},i})^2 = \lambda_{\text{mode},i} = 0.85 \cdot 2^{(QP_i^* - 12)/3}$ for H.264/AVC; for all motion-compensated macroblock/block modes, determining the associated motion vectors $m_i$ and a reference index $r_i$ by minimizing the Lagrangian functional $[m_i, r_i] = \arg\min_{m \in M,\, r \in R} \left\{ D_{DFD}(i,m,r) + \lambda_{\text{motion},i} \cdot R_{MV}(i,m,r) \right\}$, with the distortion term given as $D_{DFD}(i,m,r) = \sum_{(x,y) \in B_i} \left| s(x,y,t) - s'(x - m_x,\, y - m_y,\, t_r) \right|$, where $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent an array of luminance samples of the original picture and of the decoded reference picture given by the reference index $r$, respectively, $R$ denotes the set of reference pictures stored in the decoded picture buffer, $M$ specifies the motion vector search range inside a reference picture, $t_r$ is the sampling time of the reference picture referred to by the reference index $r$, $B_i$ is the area of the corresponding block or macroblock, and $R_{MV}(i,m,r)$ specifies the number of bits needed to transmit all components of a motion vector $m = [m_x, m_y]^T$ as well as the reference index $r$; and determining the macroblock/block encoding modes $p_i$ of a macroblock $i$ by minimizing the Lagrangian cost function $p_i = \arg\min_{p \in S_{\text{mode}}} \left\{ D_{REC}(i, p \mid QP_i^*) + \lambda_{\text{mode},i} \cdot R_{\text{all}}(i, p \mid QP_i^*) \right\}$, where $D_{REC}(i, p \mid QP_i^*)$ is defined as $D_{REC}(i, p \mid QP_i^*) = \sum_{(x,y) \in B} \left( s(x,y) - s'(x, y \mid p, QP_i^*) \right)^2$, $s(\ldots)$ and $s'(\ldots)$ represent an array of original macroblock samples and their reconstruction, respectively, $B$ specifies the set of corresponding macroblock/block samples, $R_{\text{all}}(i, p \mid QP_i^*)$ is the number of bits associated with choosing a mode $p$ and quantization parameter $QP_i^*$, including the bits for the macroblock/block modes, motion vectors and reference indices as well as the quantized transform coefficients of all luminance and chrominance blocks, and $S_{\text{mode}}$ is a given set of possible macroblock/block modes.
10. Computer program embodied on a non-transitory computer-readable storage medium within a computer that enables the computer to run a process for encoding video pictures, wherein a pre-analysis of pictures is carried out, wherein for at least a part of the macroblocks at least one control parameter which assists the encoding process is determined based on at least one estimated parameter, and in a second step the picture is encoded with encoding parameters calculated based on the control parameters determined in the pre-analysis step, wherein the macroblock encoding process comprises the following steps given a target quantization parameter $QP_i^*$: calculating the Lagrangian multipliers used for motion estimation and mode decision of a macroblock $i$ according to $(\lambda_{\text{motion},i})^2 = \lambda_{\text{mode},i} = 0.85 \cdot (QP_i^*)^2$ for H.263 and MPEG-4, or $(\lambda_{\text{motion},i})^2 = \lambda_{\text{mode},i} = 0.85 \cdot 2^{(QP_i^* - 12)/3}$ for H.264/AVC; for all motion-compensated macroblock/block modes, determining the associated motion vectors $m_i$ and a reference index $r_i$ by minimizing the Lagrangian functional $[m_i, r_i] = \arg\min_{m \in M,\, r \in R} \left\{ D_{DFD}(i,m,r) + \lambda_{\text{motion},i} \cdot R_{MV}(i,m,r) \right\}$, with the distortion term given as $D_{DFD}(i,m,r) = \sum_{(x,y) \in B_i} \left| s(x,y,t) - s'(x - m_x,\, y - m_y,\, t_r) \right|$, where $s(\ldots, t)$ and $s'(\ldots, t_r)$ represent an array of luminance samples of the original picture and of the decoded reference picture given by the reference index $r$, respectively, $R$ denotes the set of reference pictures stored in the decoded picture buffer, $M$ specifies the motion vector search range inside a reference picture, $t_r$ is the sampling time of the reference picture referred to by the reference index $r$, $B_i$ is the area of the corresponding block or macroblock, and $R_{MV}(i,m,r)$ specifies the number of bits needed to transmit all components of a motion vector $m = [m_x, m_y]^T$ as well as the reference index $r$; and determining the macroblock/block encoding modes $p_i$ of a macroblock $i$ by minimizing the Lagrangian cost function $p_i = \arg\min_{p \in S_{\text{mode}}} \left\{ D_{REC}(i, p \mid QP_i^*) + \lambda_{\text{mode},i} \cdot R_{\text{all}}(i, p \mid QP_i^*) \right\}$, where $D_{REC}(i, p \mid QP_i^*)$ is defined as $D_{REC}(i, p \mid QP_i^*) = \sum_{(x,y) \in B} \left( s(x,y) - s'(x, y \mid p, QP_i^*) \right)^2$, $s(\ldots)$ and $s'(\ldots)$ represent an array of original macroblock samples and their reconstruction, respectively, $B$ specifies the set of corresponding macroblock/block samples, $R_{\text{all}}(i, p \mid QP_i^*)$ is the number of bits associated with choosing a mode $p$ and quantization parameter $QP_i^*$, including the bits for the macroblock/block modes, motion vectors and reference indices as well as the quantized transform coefficients of all luminance and chrominance blocks, and $S_{\text{mode}}$ is a given set of possible macroblock/block modes.
11. A non-transitory computer-readable storage medium on which an executable program is stored that enables a computer to execute a process for encoding video pictures according to claim 1.
12. Process in which a computer program as described in claim 10 is downloaded from a network for data transfer to a data processing unit that is connected to said network.